Clustering based on Dissimilarity First Derivatives

نویسنده

  • Ana L. N. Fred
چکیده

A hierarchical agglomerative clustering algorithm based on the analysis of dissimilarity increments between neighboring patterns is presented. The first derivative of dissimilarity between neighboring patterns inside a natural cluster is modelled by an exponential distribution, this statistic characterizing the cluster. A cluster isolation criterion is defined based on estimates of each cluster dissimilarity increments mean value, continuously updated along the clusters formation process, under a hierarchical agglomerative framework. Unreliable estimates, mainly occurring when cluster cardinality is low, can lead to over-fragmentation of the data into spurious, small sized clusters. In order to prevent this situation, a regularizing function is proposed to widen the estimates of the exponential distribution mean, when the number of samples is small. Analysis of the method is performed in a comparative study with the well known single-link and k-means algorithms. Application examples using both syntectic and real data show the ability of the method to identify arbitrary shaped clusters.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

خوشه‌بندی داده‌های بیان‌ژنی توسط عدم تشابه جنگل تصادفی

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data are typically involve in a large number of variables (genes), in comparison with number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations that are calculated by a distance function. As increa...

متن کامل

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

متن کامل

ارتقای کیفیت دسته‌بندی متون با استفاده از کمیته‌ دسته‌بند دو سطحی

Nowadays, the automated text classification has witnessed special importance due to the increasing availability of documents in digital form and ensuing need to organize them. Although this problem is in the Information Retrieval (IR) field, the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown a better performance than the others. I...

متن کامل

On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering

Hybrid clustering combines partitional and hierarchical clustering for computational effectiveness and versatility in cluster shape. In such clustering, a dissimilarity measure plays a crucial role in the hierarchical merging. The dissimilarity measure has great impact on the final clustering, and data-independent properties are needed to choose the right dissimilarity measure for the problem a...

متن کامل

Clustering with Intelligent Linexk-Means

The intelligent LINEX k-means clustering is a generalization of the k-means clustering so that the number of clusters and their related centroid can be determined while the LINEX loss function is considered as the dissimilarity measure. Therefore, the selection of the centers in each cluster is not randomly. Choosing the LINEX dissimilarity measure helps the researcher to overestimate or undere...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002